An Estonian Morphological Analyser and the Impact of a Corpus on Its Development
Author
Abstract
The paper describes a morphological analyser for Estonian and how the use of a text corpus influenced both its development and the resulting program. The influence is not limited to the lexicon; it is also noticeable in the resulting algorithm and implementation. When work on the analyser started, there was no computational treatment of Estonian derivatives and compounds. After several cycles of development and testing on the corpus, we arrived at an acceptable algorithm for their treatment. Both the morphological analyser and the speller based on it have been successfully marketed.
Similar resources
Developing a syntactic analyser for Estonian
The aim of the present article is to give an overview of the current state of syntactic analysis of Estonian and describe problems that were encountered in the generation of syntactic rules for the syntactic analyser of Estonian. So far only the rules based on linguistics have been used. This article is focused on the statistical methods in syntactic analysis and it describes the experiments of...
Designing a Speech Corpus for Estonian Unit Selection Synthesis
The article reports the development of a speech corpus for Estonian text-to-speech synthesis based on unit selection. Introduced are the principles of the corpus as well as the procedure of its creation, from text compilation to corpus analysis and text recording. Also described are the choices made in the process of producing a text of 400 sentences, the relevant lexical and morphological pref...
Experimental Bootstrapping of Morphological Analysers for Nguni Languages
This paper addresses the experimental bootstrapping of the development of broad-coverage finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by using an existing prototype of a morphological analyser for Zulu. These languages are both morphologically complex and resource-scarce. The research question is whether bootstrapping is feasible across the language boundaries be...
The Estonian Reference Corpus: Its Composition and Morphology-aware User Interface
This paper gives a brief overview of the composition as well as technical and morphological annotation of the Reference Corpus of Estonian. A user interface using the morphological information about lemmas and grammatical categories of word-forms is presented.
Finite-state Relations Between Two Historically Closely Related Languages
Regular correspondences between historically related languages can be modelled using finite-state transducers (FST). A new method is presented by demonstrating it with a bidirectional experiment between Finnish and Estonian. An artificial representation (resembling a protolanguage) is established between two related languages. This representation, AFE (Aligned Finnish-Estonian), is based on the l...
Journal title:
- Computers and the Humanities
Volume 31, Issue
Pages -
Publication date 1997